Weld: Fast Data-Parallel Computation on Modern Hardware

Author

James J. Thomas

Abstract

Modern hardware is difficult to use efficiently, requiring complex optimizations like vectorization, loop blocking and load balancing to get good performance. As a result, many widely used data processing systems fall well short of peak hardware performance. We have developed Weld, an intermediate language and runtime that can run data-parallel computations efficiently on modern hardware. The core of Weld is a novel intermediate language (IL) that is expressive enough to capture common data-parallel applications (e.g., SQL, graph analytics and machine learning) while being easy to parallelize on modern hardware, through the use of a simple “parallel builder” abstraction and nested parallel loops. Weld supports complex optimizations like vectorization and loop blocking, as well as a multicore CPU backend. Finally, Weld’s runtime can optimize across library functions used in the same program, enabling further speedups that are not possible with today’s disjoint libraries. In this thesis, we describe the Weld IL and then turn to the multicore CPU backend, providing a theoretical analysis suggesting that it has low overheads and showing that microbenchmarks and real-world applications like TensorFlow have excellent multicore performance when ported to run on Weld.

Thesis Supervisor: Matei A. Zaharia
Title: Assistant Professor


Similar resources

Weld: Rethinking the Interface Between Data-Intensive Applications

Data analytics applications combine multiple functions from different libraries and frameworks. Even when each function is optimized in isolation, the performance of the combined application can be an order of magnitude below hardware limits due to extensive data movement across these functions. To address this problem, we propose Weld, a new interface between data-intensive libraries that can ...


Implementing a Fast Lucas-Lehmer Test on Programmable Graphics Hardware

The Lucas-Lehmer test provides a deterministic algorithm for testing whether, for a prime number p, M_p = 2^p − 1 is also a prime number. The current work demonstrates that this test can be effectively implemented on a parallel graphics processing unit (GPU). The parallelization was achieved by two main parallel methods: (1) fast multiplication using parallel Fast Fourier transforms in extended prec...


Weld: A Common Runtime for High Performance Data Analytics

Modern analytics applications combine multiple functions from different libraries and frameworks to build increasingly complex workflows. Even though each function may achieve high performance in isolation, the performance of the combined workflow is often an order of magnitude below hardware limits due to extensive data movement across the functions. To address this problem, we propose Weld, a...


A Common Runtime for High Performance Data Analysis


Data Parallel Computation on Graphics Hardware

As the programmability and performance of modern GPUs continue to increase, many researchers are looking to graphics hardware to solve problems previously performed on general purpose CPUs. In many cases, performing general purpose computation on graphics hardware can provide a significant advantage over implementations on traditional CPUs. However, if GPUs are to become a powerful processing ...




Publication date: 2016
